Reranking Translation Candidates Produced by Several Bilingual Word Similarity Sources
Abstract
We investigate the reranking of the output of several distributional approaches on the Bilingual Lexicon Induction task. We show that reranking an n-best list produced by any of those approaches leads to very substantial improvements. We further demonstrate that combining several n-best lists by reranking is an effective way of further boosting performance.
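As a rough illustration of what combining several n-best lists by reranking can look like, the sketch below merges candidate lists for a single source word and reorders them by a weighted sum of reciprocal ranks. This is a simple stand-in chosen for brevity, not the reranker used in the paper; the function name `rerank_combined`, the `weights` parameter, and the example word lists are all hypothetical.

```python
from collections import defaultdict


def rerank_combined(nbest_lists, weights=None):
    """Merge several n-best lists and rerank by weighted reciprocal rank.

    nbest_lists: one ranked list of candidate translations per source
                 (best candidate first).
    weights:     optional per-source weights (assumed, not from the paper).
    """
    if weights is None:
        weights = [1.0] * len(nbest_lists)
    scores = defaultdict(float)
    for weight, candidates in zip(weights, nbest_lists):
        for rank, candidate in enumerate(candidates, start=1):
            # A candidate ranked highly by several sources accumulates more mass.
            scores[candidate] += weight / rank
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical n-best lists for one source word from two similarity sources.
source_a = ["chat", "chien", "chaton"]
source_b = ["chaton", "chat", "félin"]
print(rerank_combined([source_a, source_b]))
```

A candidate that appears near the top of both lists ("chat" here) ends up ahead of one that only a single source ranks first, which is the intuition behind combining sources. A learned reranker would replace the fixed reciprocal-rank score with a model trained on features of each candidate.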
Similar resources
Bilingual lexicon extraction from comparable corpora for closely related languages
In this paper we present a knowledge-light approach to extract a bilingual lexicon for closely related languages from comparable corpora. While in most related work an existing dictionary is used to translate context vectors, we take advantage of the similarities between languages instead and build a seed lexicon from words that are identical in both languages and then further extend it with co...
Finding Translation Candidates from Patent Corpus
This paper describes a method for retrieving technical terms and finding their translation candidates from patent corpora. The method improves the reliability of bilingual seed words that measure similarity between a target word and its translation candidates. We conducted an experiment with PAJ (Patent Abstracts of Japan), which is a collection of bilingual patent abstracts written in Japanese...
Bilingual Dictionary Construction with Transliteration Filtering
In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine tra...
UAlacant word-level machine translation quality estimation system at WMT 2015
This paper describes the Universitat d’Alacant submissions (labelled as UAlacant) for the machine translation quality estimation (MTQE) shared task in WMT 2015, where we participated in the word-level MTQE sub-task. The method we used to produce our submissions uses external sources of bilingual information as a black box to spot sub-segment correspondences between a source segment S and the tra...
Modèle de traduction statistique à fragments enrichi par la syntaxe. (A Syntax-Augmented Phrase-Based Statistical Machine Translation Model)
Traditional Statistical Machine Translation models are not aware of linguistic structure. Thus, target lexical choices and word order are controlled only by surface-based statistics learned from the training corpus. Knowledge of linguistic structure can be beneficial since it provides generic information compensating data sparsity. The purpose of our work is to study the impact of syntactic inf...